Data Collection and Transliteration of Japanese Spontaneous Database in the Travel Arrangement Task Domain

Authors

  • Akira Kurematsu
  • Youichi Akegami
  • Tanja Schultz
  • Susanne Burger
Abstract

This paper describes a method for constructing and transcribing Japanese spontaneous speech data for VERBMOBIL, the German research project on speech translation. A spontaneous spoken dialogue database is the basis for developing speech and language processing for dialogue systems such as speech translation systems. Collection of extended human-to-human spoken dialogue data in the travel arrangement scenario has begun in German, English, and Japanese. Romanized transcription is used to develop the acoustic and language models of the speech recognition system, as well as the natural language translation system. This paper describes issues in the transliteration method and several rules and conventions for transcribing Japanese spoken dialogue.

Related resources

Language model selection based on the analysis of Japanese spontaneous speech on travel arrangement task

This paper deals with the issue of language model selection based on the analysis of data collection for spontaneous speech in Japanese in the travel arrangement task which contains five different subtasks. The procedure of transcription and segmentation of the Japanese spontaneous speech in Romanized transcription is described. The use of topic-dependent separated language models was evaluated...

An interlingua based on domain actions for machine translation of task-oriented dialogues

This paper describes an interlingua for spoken language translation that is based on domain actions in the travel planning domain. Domain actions are composed of speech acts (e.g., requestinformation), attributes (e.g., size, price), and objects (e.g., hotel, flight) and can take arguments. Development of the interlingua is guided by a database containing travel dialogues in English, Korean, Ja...

Identification of utterance intention in Japanese spontaneous spoken dialogue by use of prosody and keyword information

This paper describes the study on the identification of utterance intention in Japanese spontaneous dialogue. The procedure of tagging the dialog act which was labeled by hand was evaluated by the analysis of the prosodic information and keyword recognition for the dialogues of scheduling and travel arrangement domains. It was shown that the integration of prosody and keywords relevant to illoc...

Recognition and Transliteration of Proper Nouns in Cross-Language Record Linkage by Constructing Transliterated Word Pairs

Proper nouns in metadata are representative features for linking the identical records across data sources in different languages. To improve the recognition of proper nouns in metadata and obtain their transliterations, we propose a method to construct bilingual transliteration word pairs, in which transliterated words in target language are back-transliterated to their original words in sourc...

Japanese spontaneous speech database with wide regional and age distribution

This paper introduces a Japanese spontaneous speech database of 3,771 speakers with wide regional and age distributions. This database is designed to capture Japanese spontaneous speech characteristics and is used to develop a speaker-independent (SI) speech recognition system. This paper describes the data collection and transcription. Moreover, we show preliminary analyses through SI speech r...

Publication date: 1999